Index-Supported Similarity Search Using Multiple Representations
Authors
Abstract
Similarity search in complex databases is of great interest in a wide range of application domains. Complex objects are often described by several representations, and combining these representations usually provides more information than any single one. In this work, we introduce the use of an index structure in combination with a negotiation-theory-based approach for deriving a suitable subset of representations for a given query object. This most promising subset of representations is determined in an unsupervised way at query time. We show experimentally that this approach significantly increases the efficiency of query processing. At the same time, the effectiveness, i.e., the quality of the search results, is equal to or even higher than that of standard combination methods.
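As a rough illustration of what searching over a subset of representations can look like, the sketch below averages per-representation Euclidean distances over a chosen subset and ranks objects by a linear scan. It is a generic combination baseline under stated assumptions, not the negotiation-theory-based selection or the index structure of the paper; the names `combined_distance` and `knn`, the representation labels `color` and `shape`, and the toy data are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_distance(query, obj, representations):
    """Average the per-representation distances over the chosen subset.

    Assumption: a plain average is used here for illustration only.
    """
    total = sum(euclidean(query[r], obj[r]) for r in representations)
    return total / len(representations)


def knn(query, database, representations, k):
    """Linear-scan k-nearest-neighbour search (no index); closest ids first."""
    ranked = sorted(
        database,
        key=lambda oid: combined_distance(query, database[oid], representations),
    )
    return ranked[:k]


# Hypothetical toy database: each object has two representations.
database = {
    "a": {"color": [0.0, 0.0], "shape": [1.0, 1.0]},
    "b": {"color": [1.0, 0.0], "shape": [0.0, 0.0]},
    "c": {"color": [5.0, 5.0], "shape": [5.0, 5.0]},
}
query = {"color": [0.0, 0.0], "shape": [0.0, 0.0]}

print(knn(query, database, ["color"], 2))
print(knn(query, database, ["color", "shape"], 2))
```

The ranking can change depending on which representations are included, which is why choosing the subset per query matters; using fewer representations also means fewer distance computations per object.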
Similar resources
Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines
Vector representations and vector space modeling (VSM) play a central role in modern machine learning. In our recent research we proposed a novel approach to ‘vector similarity searching’ over dense semantic vector representations. This approach can be deployed on top of traditional inverted-index-based fulltext engines, taking advantage of their robustness, stability, scalability and ubiquity....
PP-Index: Using Permutation Prefixes for Efficient and Scalable Approximate Similarity Search
We present the Permutation Prefix Index (PP-Index), an index data structure that supports efficient approximate similarity search. The PP-Index belongs to the family of permutation-based indexes, which represent any indexed object by “its view of the surrounding world”, i.e., a list of the elements of a set of reference objects sorted by their distance order with r...
Learning Binary Codes For Efficient Large-Scale Music Similarity Search
Content-based music similarity estimation provides a way to find songs in the unpopular “long tail” of commercial catalogs. However, state-of-the-art music similarity measures are too slow to apply to large databases, as they are based on finding nearest neighbors among very high-dimensional or non-vector song representations that are difficult to index. In this work, we adopt recent machine le...
Efficient Similarity Search on Vector Sets
Similarity search in database systems is becoming an increasingly important task in modern application domains such as multimedia, molecular biology, medical imaging, computer aided design and many others. Whereas most of the existing similarity models are based on feature vectors, there exist some models which use very complex object representations such as trees and graphs. A promising way be...
Incremental All Pairs Similarity Search for Varying Similarity Thresholds with Reduced I/O Overhead
All Pairs Similarity Search (APSS) is the problem of finding all pairs of records with similarity scores above a specified threshold. Incremental All Pairs Similarity Search (IAPSS) is the problem of performing APSS multiple times over the same dataset by varying the similarity threshold. This problem is ubiquitous in many real-world systems like search engines, online social networks, and digi...
Publication date: 2008